2 research outputs found
A Study of Geometric Semantic Genetic Programming with Linear Scaling
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Machine Learning (ML) is a scientific discipline that endeavors to enable computers
to learn without the need for explicit programming. Evolutionary Algorithms (EAs),
a subset of ML algorithms, mimic Darwin’s Theory of Evolution by using natural
selection mechanisms (i.e., survival of the fittest) to evolve a group of individuals
(i.e., possible solutions to a given problem). Genetic Programming (GP) is the most
recent type of EA and it evolves computer programs (i.e., individuals) to map a set of
input data into known expected outputs. Geometric Semantic Genetic Programming
(GSGP) extends this concept by allowing individuals to evolve and vary in the semantic
space, where the output vectors are located, rather than being constrained by syntax-based
structures. Linear Scaling (LS) is a method that was introduced to facilitate the
task of GP of searching for the best function matching a set of known data. GSGP
and LS have both, independently, shown the ability to outperform standard GP for
symbolic regression. GSGP uses Geometric Semantic Operators (GSOs), different
from the standard ones, without altering the fitness, while LS modifies the fitness
without altering the genetic operators. To the best of our knowledge, there has been
no prior utilization of the combined methodology of GSGP and LS for classification
problems. Furthermore, despite the fact that they have been used together in one
practical regression application, a methodological evaluation of the advantages and
disadvantages of integrating these methods for regression or classification problems
has never been performed. In this dissertation, a study of a system that integrates both
GSGP and LS (GSGP-LS) is presented. The performance of the proposed method, GSGP-LS,
was tested on six hand-tailored regression benchmarks, nine real-life regression
problems and three real-life classification problems. The obtained results indicate that
GSGP-LS outperforms GSGP in the majority of the cases, confirming the expected
benefit of this integration. However, for some particularly hard regression datasets,
GSGP-LS overfits training data, being outperformed by GSGP on unseen data. This
contradicts the idea that LS is always beneficial for GP, warning the practitioners about
its risk of overfitting in some specific cases.
Machine Learning (ML) is a scientific discipline that strives to allow computers to
learn without the need for explicit programming. Evolutionary Algorithms (EAs), a
subset of ML algorithms, mimic Darwin's Theory of Evolution, using natural selection
and "survival of the fittest" mechanisms to evolve a group of individuals (that is,
possible solutions to a given problem). Genetic Programming (GP) is an algorithmic
process that evolves computer programs (or individuals) to link input and output
features. Geometric Semantic Genetic Programming (GSGP) extends this concept by
allowing individuals to evolve and vary in the semantic space, where the output
vectors are located, instead of being limited by syntax-based structures. Linear
Scaling (LS) is a method introduced to facilitate GP's task of searching for the best
function matching a set of known data. Both GSGP and LS have, independently, shown
the ability to outperform standard GP for symbolic regression. GSGP uses Geometric
Semantic Operators (GSOs), different from the standard ones, without altering the
fitness, while LS modifies the fitness without altering the genetic operators. To the
best of our knowledge, there has been no prior use of the combined GSGP and LS
methodology for classification problems. Furthermore, although they have been used
together in one practical regression application, a methodological evaluation of the
advantages and disadvantages of integrating these methods for regression or
classification problems has never been carried out. This dissertation presents a study
of a system that integrates both GSGP and LS (GSGP-LS). The performance of the
proposed method, GSGP-LS, was tested on six custom regression benchmarks, nine
real-life regression problems and three real-life classification problems. The results
obtained indicate that GSGP-LS outperforms GSGP in most cases, confirming the
expected benefit of this integration. However, for some particularly hard regression
datasets, GSGP-LS overfits the training data, performing worse than GSGP on unseen
data. This contradicts the idea that LS is always beneficial for GP, warning
practitioners about the risk of overfitting in some specific cases.
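The abstract above says that LS modifies the fitness without altering the genetic operators, but it does not give the formula. The following is a minimal sketch, assuming the usual least-squares formulation of linear scaling: before the error is measured, an intercept and a slope are fitted so that `intercept + slope * output` matches the targets as closely as possible. All function names and the toy data are hypothetical and are not taken from the dissertation.

```python
def mean(values):
    return sum(values) / len(values)


def linear_scaling_coefficients(outputs, targets):
    """Least-squares intercept and slope so that intercept + slope * output fits targets."""
    out_mean, tgt_mean = mean(outputs), mean(targets)
    denominator = sum((o - out_mean) ** 2 for o in outputs)
    if denominator == 0.0:
        # A constant program carries no slope information: predict the target mean.
        return tgt_mean, 0.0
    slope = sum((o - out_mean) * (t - tgt_mean)
                for o, t in zip(outputs, targets)) / denominator
    intercept = tgt_mean - slope * out_mean
    return intercept, slope


def plain_mse(outputs, targets):
    """Unscaled fitness: mean squared error of the raw program outputs."""
    return mean([(t - o) ** 2 for o, t in zip(outputs, targets)])


def scaled_mse(outputs, targets):
    """LS fitness (assumed form): mean squared error after the optimal linear rescaling."""
    intercept, slope = linear_scaling_coefficients(outputs, targets)
    return mean([(t - (intercept + slope * o)) ** 2
                 for o, t in zip(outputs, targets)])


# Hypothetical program outputs with the right shape but the wrong scale and offset.
program_outputs = [1.0, 2.0, 3.0, 4.0]
expected = [3.0 * o + 5.0 for o in program_outputs]

print("plain MSE: ", plain_mse(program_outputs, expected))
print("scaled MSE:", scaled_mse(program_outputs, expected))
```

In this toy case the scaled fitness rewards the program for having the right shape even though its raw outputs are far from the targets. Because the coefficients are fitted on training data only, this extra freedom is also one plausible route to the overfitting reported in the abstract.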
An Investigation of Geometric Semantic GP with Linear Scaling
Nadizar, G., Garrow, F., Sakallioglu, B., Canonne, L., Silva, S., & Vanneschi, L. (2023). An Investigation of Geometric Semantic GP with Linear Scaling. In GECCO’23: Proceedings of the 2023 Genetic and Evolutionary Computation Conference (pp. 1165-1174). Association for Computing Machinery (ACM). https://doi.org/10.1145/3583131.3590418
Funding: This work was partially supported by FCT, Portugal, through funding of research units MagIC/NOVA IMS (UIDB/04152/2020) and LASIGE (UIDB/00408/2020 and UIDP/00408/2020). We also wish to thank the SPECIES Society and Anna Esparcia-Alcázar for organizing the SPECIES Summer School 2022, which brought us together and gave us the chance to start this collaboration.
Geometric semantic genetic programming (GSGP) and linear scaling (LS) have both, independently, shown the ability to outperform standard genetic programming (GP) for symbolic regression. GSGP uses geometric semantic genetic operators, different from the standard ones, without altering the fitness, while LS modifies the fitness without altering the genetic operators. So far, these two methods have already been joined together in only one practical application. However, to the best of our knowledge, a methodological study on the pros and cons of integrating these two methods has never been performed. In this paper, we present a study of GSGP-LS, a system that integrates GSGP and LS. The results, obtained on five hand-tailored benchmarks and six real-life problems, indicate that GSGP-LS outperforms GSGP in the majority of the cases, confirming the expected benefit of this integration. However, for some particularly hard datasets, GSGP-LS overfits training data, being outperformed by GSGP on unseen data. Additional experiments using standard GP, with and without LS, confirm this trend also when standard crossover and mutation are employed. This contradicts the idea that LS is always beneficial for GP, warning the practitioners about its risk of overfitting in some specific cases.
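The abstract refers to geometric semantic genetic operators without defining them. As a rough illustration only, the sketch below assumes the commonly cited formulation in which crossover is a convex combination of the two parents and mutation is a bounded perturbation of one parent. It applies them directly to semantics vectors (the outputs of each program on the training cases) rather than to program trees. The function names, the random vectors standing in for random programs, and the `mutation_step` parameter are all hypothetical.

```python
import random


def geometric_semantic_crossover(parent1, parent2, mask):
    """Offspring semantics: per-case convex combination of the parents.

    `mask` stands in for the outputs of a random program, assumed to lie in [0, 1].
    """
    return [r * p1 + (1.0 - r) * p2 for p1, p2, r in zip(parent1, parent2, mask)]


def geometric_semantic_mutation(parent, random1, random2, mutation_step):
    """Offspring semantics: parent plus a scaled difference of two random semantics."""
    return [p + mutation_step * (r1 - r2)
            for p, r1, r2 in zip(parent, random1, random2)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_cases = 5

# Hypothetical semantics of two parent programs on five training cases.
parent_a = [rng.uniform(-10.0, 10.0) for _ in range(n_cases)]
parent_b = [rng.uniform(-10.0, 10.0) for _ in range(n_cases)]

# Stand-ins for random programs whose outputs are bounded in [0, 1].
mask = [rng.random() for _ in range(n_cases)]
rand_1 = [rng.random() for _ in range(n_cases)]
rand_2 = [rng.random() for _ in range(n_cases)]

child = geometric_semantic_crossover(parent_a, parent_b, mask)
mutant = geometric_semantic_mutation(parent_a, rand_1, rand_2, mutation_step=0.1)

print("crossover offspring:", child)
print("mutation offspring: ", mutant)
```

Under these assumptions, each component of the crossover offspring lies between the corresponding components of its parents, and each component of the mutant differs from its parent by at most `mutation_step`. Neither operator looks at the fitness, so an LS-style fitness can be combined with them without changing either function.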